Parallel multiscale modeling of biopolymer dynamics with hydrodynamic correlations 
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We employ a multiscale approach to model the translocation of biopolymers through nanome- 
^ , ter size pores. Our computational scheme combines microscopic Molecular Dynamics (MD) with 

O ■ a mesoscopic Lattice Boltzmann (LB) method for the solvent dynamics, explicitly taking into ac- 

' count the interactions of the molecule with the surrounding fluid. We describe an efflcient parallel 

implementation of the method which exhibits excellent scalability on the Blue Gene platform. We 
investigate both dynamical and statistical aspects of the translocation process by simulating poly- 
mers of various initial conflgurations and lengths. For a representative molecule size, we explore 
I I , the effects of important parameters that enter in the simulation, paying particular attention to the 

. strength of the molecule-solvent coupling and of the external electric fleld which drives the translo- 

l' cation process. Finally, we explore the connection between the generic polymers modeled in the 

' ' simulation and DNA, for which interesting recent experimental results are available. 

g: 

O ! I. INTRODUCTION 

o , 

Q . Biological systems exhibit a complexity and diversity much richer than the simple solid or fluid systems traditionally 
• ^ ' studied in physics or chemistry. The powerful quantitative methods developed in the latter two disciplines to analyze 
the behavior of prototypical simple systems are often difficult to extend to the domain of biological systems. Advances 
(— I in computer technology and breakthroughs in simulational methods have been constantly reducing the gap between 
quantitative models and actual biological behavior. The main challenge remains the wide and disparate range of spatio- 
temporal scales involved in the dynamical evolution of complex biological systems. In response to this challenge, 
^ \ various strategies have been developed recently, which are in general referred to as "multiscale modeling". These 
^ , methods are based on composite computational schemes in which information is exchanged between the scales, in 
' either a sequential or a concurrent manner 
\^ . We have recently developed a multiscale framework which is well suited to address a class of biologically related 
problems. This method involves different levels of the statistical description of matter (continuum and particle) and 
^ is able to handle different scales through the spatial and temporal coupling of a mesoscopic fluid solvent, based on 
I ■ the lattice Boltzmann method 0, 3 (LB), to a coarse-grained particle level, which employs explicit molecular 
I \ dynamics (MD). The solvent dynamics does not require any form of statistical ensemble averaging as it is represented 
C . through a discrete set of pre-averaged probability distribution functions, which are propagated along straight particle 

■ trajectories. This dual fleld/particle nature greatly facilitates the coupling between the mesoscopic fluid and the 
J> , atomistic level, which proceeds seamlessy in time and only requires standard interpolation/extrapolation methods 

■ for information-transfer in physical space. Full details on this scheme are reported in Ref. LB and MD with 
rN ' Langevin dynamics have been coupled before Q , but our implementation involves such coupling for long molecules of 

' biological interest. In addition, we have recently developed a parallel version of the code and successfully ported it to 
■ ■ ' the IBM-BlueGene architecture. 

Motivated by recent experimental studies, we apply this multiscale approach to the translocation of a biopolymer 
through a narrow pore. This type of biophysical process is important in phenomena like viral infection by phages, 
inter-bacterial DNA transduction or gene therapy Q. In addition, it is hoped that this type of process will open a 
way for ultrafast DNA-sequencing by sensing the base-sensitive electronic signal as the biopolymer passes through 
a nanopore with attached electrodes. Experimentally, translocation is observed in vitro by pulling DNA molecules 
through micro-fabricated solid state or membrane channels under the effect of a localized electric fleld Q. From a 
theoretical point of view, simplifled schemes and non-hydrodynamic coarse-grained or microscopic models 
are able to analyze universal features of the translocation process. This, though, is a complex phenomenon involving 
the competition between many-body interactions at the atomic or molecular scale, fluid-atom hydrodynamic coupling, 
as well as the interaction of the biopolymer with wall molecules in the region of the pore. A quantitative description 
of this complex phenomenon calls for state-of-the art modeling, towards which the results presented here are directed. 
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II. COUPLING BETWEEN LATTICE BOLTZMANN AND MOLECULAR DYNAMICS 



In this section we outhne the simulation method used to couple the Lattice Boltzmann description for the solvent 
to the Molecular Dynamics description of the solute. In the LB method the basic quantity is fi{x,t), representing 
the probability of finding a "fluid particle" at the spatial mesh location x and at time t with discrete speed We 
emphasize that the "fluid particles" do not correspond to individual physical particles such as water molecules; they 
are simply an effective medium for representing the collective motion of such physical particles. Once the discrete 
distributions fi are known, the local density, flow speed and momentum-flux tensor of the fluid are obtained by a 
direct summation upon all discrete distributions: 

p(f,t) = (1) 



pu{x, t) fi{x, t)ci (2) 

i 

'P{x,t)^^f,{x,t)cA (3) 

i 

where, for the present study, the standard three-dimensional 19-speed lattice is used 01 . The fluid populations are 
advanced in time through the following evolution equation: 

+ QAt, t + M)= , t) + LoM{f, - f^^){x, t) + F,At + S,At (4) 

The discrete velocities ci connect mesh points to first and second topological neighbors, therefore the fiuid particles 
can only move along the links of a regular lattice and the synchronous particle displacements Axi = CiAt never take 
the fluid particles away from the lattice. The right hand side of Eq. ^ represents the effect of fluid-fluid molecular 
collisions, through a relaxation towards a local equilibrium, f^'^, typically a second-order expansion in the fluid velocity 
of a local Maxwellian with speed u: 

13"^ 1 ^ 

= [(3u ■Ci + — {uu- {ciCi - -pi)] (5) 

where (3 = l/fcsT is the inverse temperature, Wi a set of weights normalized to unity, and / is the unit tensor in 
cartesian space. The relaxation frequency uj controls the kinematic viscosity of the pure fluid: 



1 At 
uj ~2 



The Fi term is a stochastic force needed to inject fluctuations in the fluid at the level of fluctuating hydrodynamics, 
which is local in space and time, and is given by 

F., = F^^ {cm - ksTSab) + Pallg^abc} (6) 

a,b a,b,c 

where F^'^^ is a fluctuating stress tensor satisfying 

{Fil\x,t)F^f{x\t')) = lM:A,,,,<5(f - f )(5(i - t') (7) 

and Aabcd is a fourth-order Kronecker symbol. By construction, the fluctuating stress tensor does not affect the fluid 
mass and momentum conservation. Moreover, due to the discrete nature of the LB scheme, and in order to recover 
the fluctuation-dissipation theorem at flnite wave vectors, noise also acts at the level of the non-hydrodynamic modes 
carried by the LB method via a fluctuating heat flux i^'^^ , coupling through a suitable basis in kinetic space, g (see 
ref. for full details). 

The source term Si accounts for the presence of atomic-scale particles embedded in the LB solvent; we use here 
the term particles again, because in the general case those may represent individual atoms or collections of atoms 
(molecular units), or even coarse-grained descriptions of atomic motion, but still at the atomic length scale (a few 
A to a few nm) . S'^ is a back- reaction representing the sum of momentum and momentum-flux input per unit time 
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due to the influence of atomic-scale particles on the fluid population fi. By deflnition, the back-reaction does not 
change the density of the fluid, so that J2i '^i — 0- Momentum and momentum flux conservation of the solute and 
solvent systems imply that 

5]5.(f)c- + (8) 



The source term then reads 



S^ = -yj^/3[Ff{x) + F^ix)]-c, (9) 



so that it explicitly satisfles Eq. |(8]). In these equations, and F^ , are the friction and random forces, respectively. 

Before describing the MD part, we emphasize that the LB solver is particularly well suited to the problem at hand 
for several reasons: First, free-streaming of the fluid proceeds along straight trajectories which greatly facilitates 
the imposition of geometrically complex boundary conditions, such as those required to describe the membrane and 
nano-pore. Second, fluid difi^usivity emerges from the flrst-order LB relaxation-propagation dynamics, so that the 
kinetic scheme can march in time-steps which scale only linearly with the mesh resolution. Third, since collisions are 
completely local, the LB scheme is ideally suited to parallel computing. These features make the LB the method of 
choice compared to other available methods, such as Stokesian dynamics, which typically scale superlinearly with the 
number of particles. 

We next describe the MD section of the method, bearing in mind that the embedded solute has a molecular topology, 
such as DNA, where a linear collection of A^o beads (each bead or solute particle representing a collection of atoms or 
molecules) compose the polymer. The solute particles are advanced in time according to the following MD equations 
for the positions rp and velocities Vp 

dfp 

m-^ ^ Fp +Fp +Fp +Fp , p^l,No (10) 
where the forces on the right-hand side are given by 

= -E%^(i^^f-^^?i) (11) 

f/ = jiup - Up) (12) 

4'' = drp,t) (13) 

The flrst term represents the conservative particle-particle interactions, V{r) being a standard 6 — 12 Lennard- Jones 
potential. 



V{r) = 4e 



(7)" -(7)' 



(14) 



with an effective cut-off at Tc = 2^/^(t for the radial part, plus a harmonic potential for angular degrees of freedom, 
Vangiff) = "2 ' ^ith (j) the relative angle between two consecutive bonds, to account for distortions of the dihedral 
angles. Typical values of a and e in our simulations are 1.8 and lO"**, respectively. The second term on the right- 
hand-side of (fTO|) represents the mechanical friction between a single particle and the surrounding fluid, Vp being the 
particle velocity and Up = u(fp) the fluid velocity, evaluated at the particle position. In addition to the mechanical 
drag, the particles feel the effects of stochastic fluctuations of the fluid environment through the random term ^(fp, t), 
which is a Gaussian random noise obeying the fluctuation-dissipation relations, 

<^{rp,t) > = 

< arp,tmr,, n > = ^^^(^P - - t') (15) 

Finally, Fp corresponds to the bonding forces acting between particles with labels p and p-l- 1 of the polymeric chain. 
The bonding forces can be modelled as arising from a rigid constraint that flxes the bond length to a constant value 
ro. In this case, 
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Fp =EAfc%afc (16) 
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is the reaction force resulting from the constraint 

CTfe = \rk+i - Hep -r^ = (17) 

and {A} is the set of Lagrange multipHers {Nq — 1 in the case of a Hnear polymer of length Nq) that depend instan- 
taneously on the particle positions and momenta. The Lagrange multipliers are evaluated based on the numerical 
scheme used to propagate in time the equations of motion. The linear dependence of the dynamics from the Lagrange 
multipliers requires the inversion of a linear matrix which depends on the position and momenta of all particles com- 
posing the polymer. Such direct inversion can be avoided through the well-known SHAKE method |13|, an iterative 
procedure which solves the algebraic problem to a given accuracy. The SHAKE method, however, becomes rather im- 
practical in a parallel architecture, since at each iteration it requires frequent exchange of data, therefore representing 
a bottleneck in terms of scalability. 

As an alternative, which is particularly well suited for the parallel implementation, bonding can be modelled by 
harmonic forces: 

= -dr-V" (18) 
V = Ey[I'^+i--"*''-|--oF (19) 

k 

In order to reduce the additional polymer flexibility due to such forces, a rather high value of the force constant 
Kr can be chosen. The basic difference with the constrained dynamics is the fact that harmonic bonding introduces 
fast oscillations which can render the numerical scheme unstable at large timesteps. Typically, such modes carry 
frequencies up to two orders of magnitude higher than those relative to non-bonding forces. To take into account such 
oscillations, a small integration time step must be used which would make the simulation highly inefficient. On the 
other hand, as described in the following, a multiple time step algorithm makes it possible to achieve basically the 
same computational efliciency for the constrained and unconstrained MD schemes. 

Clearly, in the LB approach all quantities have to reside on the lattice nodes, which means that the frictional and 
random forces need to be extrapolated from the particle to the grid location. This is obtained by extracting the 
fluid velocity fleld Up at the nearest grid point from each particle position and, similarly, assigning these forces to the 
feed-back on the fluid population through the same simple recipe. We have found that this procedure is as accurate 
as a more involved bilinear interpolation/extrapolation scheme for the exchange of forces and momentum. Details on 
this scheme have been pubHshed previously (see Ref. [3|). 

The numerical solution of the stochastic equations is achieved through the Stochastic Position Verlet (SPV) scheme, 
as introduced in Ref. [l3|, a propagator which is second order accurate in time. Owing to the presence of velocity- 
dependent and stochastic forces, standard deterministic integrators, such as the Verlet one, would give rise to flrst order 
accuracy of the resulting trajectory. The original SPV method needs to be modifled in the presence of constraining 
forces. Positions and momenta are advanced in time according to 







dt ^ 
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= fp 
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(SHAKE) 
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(RATTLE) 








rp 







i^p(r=*^)+A/'[^(l-e-27dt)] 
m 

(20) 

where A/'[cr^] is a gaussian random number of zero mean and variance , and 

Fp = F^ + -fup + F; (21) 

and flnally, the SHAKE procedure [l3| is employed to satisfy constraints, {(j{r)} = 0, while the RATTLE procedure 
[iB l imposes the set of conditions v)} = 0. Clearly, the computational efi^ort of the SPV integrator is the same 
as in standard MD, where the conventional Verlet algorithm is usually employed, by computing only once and storing 
the exponential factors appearing in eqs. I|20p . When considering the momentum exchange with the solvent, the 
corrected velocities appear in the friction forces. Moreover, during the MD sub-cycle, the hydrodynamic fleld is frozen 
at time t = nAt. 

For the translocating polymer, the MD solver is marched in time with a fraction of the LB time-step, dt — At/M 
and the timestep ratio M is chosen to be 5. 
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When dealing with the harmonic bonds a multiple time step integrator is employed by introducing a nested 
sub-cycle over a timestep dt^ = dt/M^ , as follows: 







dt^ 




= rp + 


^Vp 




= Vp + 










Vp 


= Vp + 




Tp 


- rp + 


dt^ 





^ „-tdt- I, rp 

m7 ^ m 



A/^ cycle 



(22) 

The multiple time step solver is now marched in time with timestep ratios M — 5 and = 20, providing accurate 
results in terms of stability and unbiased statistical averages, as verified by monitoring the system average temperature 
which remains equal to the preset value. More details on the method and the efficiency of each of the schemes involved 
(MD and LB) are given in Ref. 0. 



III. CODE PARALLELIZATION 



We have recently developed a parallel version of the LB-MD multiscale code and ported it to the IBM Blue- 
Gene architecture. This development provides a boost in computational efficiency by an order of magnitude in 
both solvent and solute degrees of freedom, making possible simulations in which one MD particle corresponds to 
one DNA base-pair. This level of resolution at the lowest scale of the multiscale approach enables the concurrent 
handling of hydrodynamics with chemical specificity. In this section, we provide the essential features of the parallel 
implementation of the LB-MD code. 

The parallelizations of the Lattice Boltzmann Equation method and of the Molecular Dynamics method, separately, 
have been extensively studied for a number of years [H, [13, E E, H [m, m. However, the coupHng of these 
techniques poses new issues that need to be addressed in order to achieve scalability and efficiency for large scale 
simulations. We addressed these issues starting from a serial version of the combined code, instead of trying to combine 
two existing parallel versions. We chose MPI as the communication interface since it offers high portability among 
different platforms and allows good performance due to the availability of highly tuned implementations. For the two 
sections of the code, we employed a parallelization technique that entails a sort of "run-time" pre-processing. For the 
LB part of the code, this initial stage can be summarized as follows. Each node of the LB lattice is labeled with a tag 
that identifies it as belonging to a specific part of the computational domain ( e.g., fiuid, wall, boundary, etc.), as read 
from an input file. In this way it is possible to use the code for different problems without recompiling it. At first, a 
subset of nodes is assigned to each computational task, attempting to balance the number of nodes per task as much 
as possible (obviously in some cases this operation cannot be done exactly). The assignment takes into account the 
domain decomposition strategy that can be one-, two- or three-dimensional. All seven possible combinations (that is, 
decompositions along X, Y, Z, XY, XZ, ZY, XYZ) are supported. The decomposition strategy can be chosen at run 
time as well. Having assigned the nodes to the tasks, the pre-processing phase begins. Basically, each task globally 
checks which tasks own the nodes to be accessed during the subsequent phases of simulation, for the streaming part 
of the LB algorithm. Such information is exchanged by using MPI collective communication primitives so that each 
task knows the neighboring peer for send/receive operations. Information about the type and size of data to be 
sent/received is exchanged as well. 

In the LB scheme there are several parts in which data are exchanged: i) streaming; ii) handling of periodic boundary 
conditions; iii) presence of refiecting or absorbing walls within the computational domain. For each of these cases 
we adopt the following approach: the receive operations are posted in advance by using corresponding non-blocking 
MPI primitives, then the send operations are carried out using either blocking or non-blocking primitives depending 
on the parallel platform in use (unfortunately, few platforms allow real overlapping between communication and 
computation). Then, each task waits for the completion of its receive operations using the MPI wait primitives. The 
last operation, in the case of non-blocking send, is to wait for their completion. The evaluation of global quantities 
(e.g., the momentum along the X,Y,Z directions) is carried out by using MPI collective communication primitives 
of reduction. This parallelization scheme works fine for the LB part of the code, provided that the computational 
domain remains fixed (that is, each node maintains the same tag), otherwise a new pre-processing step is required. 
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In absence of particles embedded in the fluid, the performance of the parallel LB section is very good, as shown in 
Table m which gives the results for two large lattices obtained using 512 and 1024 processors of the IBM BlueGene 
parallel system [23|- Actually, excellent scalability is achieved even for a much smaller lattice, as reported in Table [TTl 



Lattice size 


Time (sec) 
512 tasks 


Time (sec) 
1024 tasks 


512 X 256 X 256 
1024 X 512 X 512 


71 
584 


35 
286 



Table I: Times (in seconds) for 100 LB time-steps using 512 and 1024 IBM BlueGene processors 



Number of tasks 


Time (sec) 


Number of nodes per task 


Efficiency (%) 


32 


69.6 


65536 




128 


17.4 


16384 


100 


512 


4.38 


4096 


99.4 


1024 


2.32 


2048 


93.8 



Table II: Times for 100 LB time-steps on a 128 x 128 x 128 lattice using the IBM BlueGene. Note that 32 is the minimum 
number of tasks with the system configuration used for this test. 

For the Molecular Dynamics section, a parallelization strategy speciflc to the multi-scale method in use must be 
developed. The main problem, given the highly non-homogeneous distribution of particles for typical multi-scale 
applications, which is particularly true in the case of biopolymer translocation through a nanopore, is that a strong 
load imbalance is expected (one task may have been assigned hundreds of particles whereas a neighboring one may 
have no particles at all); therefore, a careful partitioning of the particles is necessary. The usual optimal choice for 
MD is a domain decomposition strategy, where as a first option, the parallel sub-domains coincide with the ones of the 
LB scheme. In this way, each computational task performs sequentially both the LB and MD calculations, avoiding 
the communication costs arising from a functional decomposition. Alternatively, a more MD-specific decomposition, 
which considers only the regions of the spatial domain populated by particles, could be a better choice in terms of 
stand-alone MD. With this second option, however, the exchange of momentum among particles and the surrounding 
fiuid becomes a non-local operation, possibly with long-range point-to-point communications from the viewpoint of 
the underlying hardware platform, and a consequently unacceptable communication cost. 

We have thus opted for the first strategy, such that the interaction with the fiuid is completely localized (there is 
no need to exchange data among tasks during this stage). However, since this subdivision gives rise to a strong load 
imbalance, we resort to a dynamic load balancing algorithm, as outlined in the following. 

At first, during the pre-processing step a subset of particles is assigned to the computational tasks. As the sim- 
ulation proceeds, particles migrate from one domain to another and particle coordinates, momenta and identities 
are re-allocated among tasks. Non-bonding forces between intra- and inter-domain pairs of particles, involving the 
communication of particle positions between neighboring tasks, are computed. Moreover, the molecular topology is 
taken into account by exchanging details on the molecular connectivity, in order to compute bonding forces locally. In 
this respect, the way parallel MD is designed is rather standard [23| and will not be described in detail. As a simple 
add-on to the standard procedure, each task carries out the exchange of momentum with the fiuid. 

The dynamic load balancing strategy impacts the computation of the non-bonding forces, representing the most 
CPU demanding part of MD. At first, the strategy requires a communication operation between neighboring tasks 
that tracks the number of particles assigned to the neighbors. If a task has a number of particles that exceeds by a 
predefined threshold the number of particles assigned to one neighbor, it sends to that neighbor part of the exceeding 
particles, so that local load balancing for the computation of non-bonding forces is achieved. A set of precedence 
rules prevents situations in which a task sends particles to a neighbor and receives particles from another. The 
task that receives particles from a neighbor computes the corresponding non-bonding forces and sends back these 
quantities to the task that actually owns the particles. Since the cost of the computation of non-bonding forces grows 
(approximately) as the square of the number of particles in each domain, the communication/computation ratio is 
favorable, so that we obtain, even with relatively few particles, a reasonable efficiency, as confirmed by the results 
reported in Table ITTTl 

This level of efficiency allows one to think of the possibilities that lie ahead in this type of simulation, where one 
will soon have at their disposal IBM Blue-Gene systems with ~ 10^ processors. With this computational power, we 
estimate that it will be possible to simulate 4 • 10^ particles for 10® time-steps corresponding to approximately 1/isec 
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Number of tasks 


Time (sec) 


Efficiency (%) 


32 


230.4 




128 


84.9 


67.8 


512 


27.9 


51.6 


1024 


14.2 


50.7 



Table III: Times (in seconds) for 100 iterations of a 4000 particles biopolymer translocation in a 128 x 128 x 128 lattice. Note 
that 32 is the minimum number of tasks with the system configuration used for this test. 

over a region of about 1 fim in one day. One can then actually think of modelling atomistic simulation in the realm of 
nearly-macroscopic time-scales. In such an approach, the break-even point for chemical specificity, where each bead 
would map one base-pair, could be reached, marking the hand-shaking point with a new generation of multiscale 
codes. This can be accompHshed through using specific potentials in the code to account for the molecular specificity 
of the different base-pairs. However, the scientific problem, even with one base-pair per bead, of deciding how to 
coarse grain the various atoms in the base-pair will remain an open challenge. 

IV. NUMERICAL SET-UP 

In our simulations we use a three-dimensional box of size x Nx/2 x 7Vj:/2 in units of the lattice spacing Aa:. 
The box contains both the polymer and the fiuid solvent. The former is initialized via a standard self- avoiding 
random walk algorithm and further relaxed to equilibrium by Molecular Dynamics. At time zero, the first bead of the 
polymer is placed at the vicinity of the pore. The solvent is initialized with the equilibrium distribution corresponding 
to a constant density and zero macroscopic speed. Periodicity is imposed for both the fluid and the polymer in all 
directions. A separating wall is located in the mid-section of the x direction, at x/Ax — Nx/2, with a square hole of 
side h — 3Ax at the center, through which the polymer can translocate from one chamber to the other. For polymers 
with up to No — 400 beads we use = 80; for larger polymers — 100. At t = the polymer resides entirely in 
the right chamber at x/Ax > Nx/2. 

Translocation is induced by a constant electric force (Fdrive) which acts along the x direction and is confined in 
a rectangular channel of size 3 Ax x Aa; x Ax along the streamwise {x direction) and cross-fiow {y,z directions). 
The solvent density and kinematic viscosity are 1 and 0.1, respectively, and the temperature is ksT = 10"'*. All 
parameters are in units of the LB timestep At and lattice spacing Ax, which we set equal to 1. Additional details 
have been presented in Ref. In our simulations we use Fdrive = 0.02 and a friction coefficient 7 = 0.1. It should 
be kept in mind that 7 is a parameter governing both the structural relation of the polymer towards equilibrium 
and the strength of the coupling with the surrounding fluid. With our parametrization, the process falls in the fast 
translocation regime, where the total translocation time is much smaller than the Zimm relaxation time. 

In order to interpret our results in terms of physical units, we map the semiflexible polymers used in our simulations 
with the DNA typical persistence length. With three lattice sites for a 12 nm large hole, the lattice site is Anm. With 
this mapping, our pore size is close to the experimental pores which are of the order of 12 nm and our MD particles 
correspond to about 30 base pairs, reasonably close to the persistence length of A-phage double-strand DNA 50 nm) . 
The fact that the polymers modelled here have a persistence length of about 10 monomers is also confirmed analytically 
by approaching our polymers as worm like chains. Moreover, the polymers presented here correspond to DNA lengths 
in the range 0.8 — 20 kbp whereas the DNA lengths used in the experiments are larger (up to ~ 100 kbp); the current 
multiscale approach can be extended to handle these lengths, assuming that appropriate computational resources are 
available. 



A. Translocation statistics 

Extensive simulations of a large number of translocation events over 100 — 1000 initial polymer configurations for 
each length confirm that most of the time during the translocation process the polymer assumes the form of two 
almost compact blobs on either side of the wall: one of them (the untranslocated part, denoted by U) is contracting 
and the other (the translocated part, denoted by T) is expanding. Snapshots of a typical translocation event shown 
in Fig. [1] firmly support this picture. 

The variety of different initial polymer realizations produce a scaling law dependence of the translocation times on 
length We construct duration histograms by accumulating all events for each length. The resulting distributions 
deviate from simple gaussians and are skewed towards longer times (see Fig. [2]). Hence, the translocation time for 
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49% 
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Figure 1: Snapshots of a typical event: a polymer (A'o = 300) translocating from the right to the left is depicted at various 
stages of the process. The numbers shown in each panel represent the fraction of the polymer that has already passed through 
the pore. The vertical dots in the middle of each panel represent the membrane wall. 




10000 



20000 



30000 



time (At) 



Figure 2: Duration histograms for polymers of different lengths. The distributions are skewed and the most probable time from 
these distributions is used to construct the scaling shown in the inset (cyan points). The scaling for the case without a fluid 
solvent is also shown (black points). 



each length is not assigned to the mean, but to the most probable time, which is the position of the maximum in the 
histogram. By calculating the most probable times for each length, a superlinear relation between the translocation 
time T and the number of beads A^o is obtained, as shown in Fig.[2l[inset). The exponent in the scaling law t{No) ^ Nq 
is calculated to be a '~ 1.28±0.01, for lengths up to A^o = 500 beads. The observed exponent is in very good agreement 
with a recent experiment on double-stranded DNA translocation, that reported a ~ 1.27 ± 0.03 [l^l- This agreement 
makes it plausible that the generic polymers modeled in our simulations can be thought of as DNA molecules. Without 
hydrodynamics translocation is significantly slowed down as indicated by a scaling exponent a' ~ 1.36 ± 0.03; data 
without the presence of a solvent are also presented in Fig. [21 
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Figure 3: Radii of gyration as functions of the number of beads (left panel) and time (right panel) as an average over about 
100 events for A'o = 300. Blue is the untranslocated part Ru, red the translocated part Rt and black the effective radius Reff, 
defined in Eq. ^3^. 



B. Translocation dynamics 

We next turn to the dynamics of the biopolymer as it passes through the pore. A radius of gyration Ri{t) (with 
/ = U,T) is assigned to each part of the polymer, the untranslocated and translocated one; we can also define an 
effective radius through 

i?e// = [i?^+i?^]^/'^ (23) 

with u 0.6, which applies to a self-avoiding random walk. At the end of the translocation process, the radius 
of gyration is considerably smaller than it was initially: Rritx) < i?£/(0), where tx is the total translocation 
time for an individual event. Taking averages over a few hundreds of events for A^o = 200 beads showed that 
Xr = RT{tx)/ Ru{0) ~ 0-7. This reveals that as the polymer passes through the pore it becomes more compact than 
it was at the initial stage of the event, due to incomplete relaxation. In Fig. [3l we represent both radii of gyration 
as functions of the number of beads and of the translocation time. The curves shown in the figure are averages over 
about 100 events for A^q = 300. By definition, Ru{t) vanishes &it = tx, while Rt increases monotonically from t = 
to t = tx , although it never reaches the value Ru{t = 0). The rates of change of the two radii are approximately 
equal and constant (see left panel of Fig. [H), except at the end points of the event where the radius of gyration itself 
is not a well defined quantity because of the small number of beads included. The effective radius Reff shows a very 
small decreasing slope, in other words, it is almost constant (right panel of Fig. [3]). 

Throughout its motion the polymer continuously interacts with the fluid environment. The forces that essentially 
control the process are the electric drive Fdrive,i and the hydrodynamic drag Fdrag,i which act on each bead, for the 
untranslocated {I = U) and translocated {I — T) parts.. The drag forces F^^ag,! on both parts of the polymer, change 
in time. In Fig. [U both drag forces are presented as a function of time, together with their sum and the value of 
the electric drive in the pore, Fdrive = qE{Nr), where {Nr) is the average number of resident monomers in the pore 
(simulations provide A^^ ~ 4.2). This flgure, referring to an average over the entire ensemble of trajectories for a 
300-monomer molecule, shows that the untranslocated strand alone can by no means balance the external force; only 
the sum of the translocated and untranslocated drag forces comes close to balancing the drive. 

Apart from these forces and speciflcally at the end points (initiation and completion of the passage through the 
pore) entropic forces become important. The fluctuations experienced by the polymer due to the presence of the fluid 
are correlated to these entropic forces which, at least close to equilibrium, can be expressed as the gradient of the 
free energy with respect to the fraction of translocated beads. At the final stage of a translocation event, the radius 
of the untranslocated part undergoes a visible deceleration (see Fig. [3l), the majority of the beads having already 
translocated. It is, thus, entropically more favorable to complete the passage through the hole rather than reverting 
it, that is, the entropic forces cooperate with the electric field and the translocation is accelerated. 

The entropic forces can also lead to rare events, such as retraction, which occur in our simulations at a rate less than 
2% and depend on length, initial polymer configuration and parameter set. A retraction event is related to a polymer 
that anti-translocates after having partially passed through the pore. We have visually inspected the retraction events 
and associate them with the translocated part entering a low-entropy configuration (hairpin-like) subject to a strong 
entropic pull-back force from the untranslocated part: The translocated part of the polymer assumes an elongated 
conformation, which leads to an increase of the entropic force from the coiled, untranslocated part of the chain. As a 
result, the translocation is delayed and eventually the polymer is retracted. 
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Figure 4: Viscous drag on the untranslocated and translocated parts of the polymer as a function of the number of beads and 
for an average of 100 translocations events with A'o = 300 beads. The horizontal line denotes the electric drive in the pore 
region. 



C. Translocation work 



As a final step, we study the work performed on the biopolymer throughout its translocation. On general grounds, 
hydrodynamic interactions are expected to minimize frictional efi^ects and form a cooperative background that assists 
the passage of the polymer through the pore. We investigate the cooperativity of the hydrodynamic field through the 
work (Wh) made by the fiuid on the polymer per unit time: 

^=7(E^^*W-"'W> (24) 

i 

Through this definition, positive values of this hydrodynamic work rate indicate a cooperative effect of the solvent, 
while negative values indicate a competitive effect by the solvent. The work We done per timestep by the electric 
field on the polymer can also be easily obtained through the expression: 

^^{J2^1r^ve,^■m) (25) 

i 

The brackets in the above equations denote averages over different realizations of the polymer for the same length. 
The results for the averages over all realizations are qualitatively similar to the work rates for an individual event of 
the same length. For all lengths studied here, we found that the total work per timestep of the hydrodynamic field 
on the whole chain is essentially constant, as shown in Fig. [5] for the averages over all events at each polymer length. 
For all these cases, the electric work rates are also constant with time. The hydrodynamic work per time is larger 
than the corresponding electric field work, because the latter only acts in the small region around the pore, and this 
is the reason why the electric field work is independent of polymer length. 

In addition to the variation of the work rates with time it is useful to analyze their distributions during translocation 
events. We show these in Fig. [U where it is evident that the distribution of the hydrodynamic work rate lies entirely 
in the positive range, indicating that hydrodynamics turns the solvent into a cooperative environment, that is, it 
enhances the speed of the translocation process. In the same figure, the distribution for the electric work rate over 
all events for all lengths considered is also shown; these distributions are mostly positive but have a small negative 
tail which indicates that beads can be found moving against the electric field. 



V. CONCLUSIONS 



We have presented a new multiscale methodology based on the direct coupHng between atomic or molecular scale 
particle motion and mesoscopic hydrodynamics of the surrounding solvent. Due to the particle-like nature of the 
mesoscopic lattice Boltzmann solver, this coupling proceeds seamlessly in space and time. Correlations between the 
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Figure 5: Translocation work per unit time for the hydrodynamic (solid lines) and electric fields (dotted lines) for polymers of 
different lengths. The right panel shows the probability distributions of both work rates. The labels correspond to the number 
of monomers A'o for each case. 

atomic scale and the hydrodynamic scale are also explicitly included through direct and local interactions between the 
particles representing solvent motion and the those belonging to the polymer. As a result, hydrodynamic interactions 
between the polymer and the surrounding fluid are explicitly taken into account, with no need of resorting to non-local 
representations. This allows a state-of-the-art modeling of biophysical phenomena, where hydrodynamic correlations 
play a significant role. We have also described an efficient parallel implementation of the method which exhibits 
excellent scalability on the IBM BlueGene platform. 

As an application we modelled the translocation of polymers, which resemble DNA, through nanometer-sized pores. 
Besides statistical properties, such as scaHng exponents, the present methodology affords direct insights into the details 
of the dynamics as well as the energetics of the translocation process, thereby offering a very valuable complement 
to experimental investigations of these complex and fascinating biological phenomena. It also shows a significant 
potential to deal with problems that combine complex fiuid motion and dynamics at the molecular scale. 
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